Global burden and epidemiological prediction of polycystic ovary syndrome from 1990 to 2019: A systematic analysis from the Global Burden of Disease Study 2019

Objective To comprehensively assess the global, regional, and national burden of polycystic ovary syndrome (PCOS) in terms of incidence, prevalence, and years lived with disability (YLDs) based on the Global Burden of Disease Study (GBD) 2019. Methods This was a cross-sectional descriptive study. Data on PCOS incidence, prevalence, and YLDs from 1990 to 2019 were obtained from the GBD study 2019. The estimates were presented by Commonwealth income group, WHO region, and sociodemographic index, together with the estimated annual percentage change (EAPC). The EAPC data were grouped into four levels by hierarchical clustering and displayed on the world map. The Autoregressive Integrated Moving Average (ARIMA) and Bayesian age-period-cohort (BAPC) models were used to predict the PCOS burden over the next 20 years. Results The annual number of incident PCOS cases increased from 1.4 million in 1990 to 2.1 million in 2019 (54.3%). Among WHO regions, only the Region of the Americas showed a decreasing EAPC of incidence, yet its age-standardized incidence rate (ASIR) was the highest in both 1990 and 2019. There was no significant correlation between the human development index (HDI) and EAPC. However, when HDI < 0.7, the EAPC of incidence and prevalence was positively correlated with HDI, and when HDI > 0.7, it was negatively correlated with HDI. Countries with a middle-level HDI had the strongest increasing trend in ASIR and age-standardized prevalence rate (ASPR). The 10 to 19 years age group had the highest incidence counts of PCOS globally. In addition, the ARIMA and BAPC models both showed a consistently increasing trend in the burden of PCOS. Conclusion To better promote early diagnosis and treatment, expert consensus and diagnostic criteria should be formulated according to the characteristics of different ethnic groups or regions. It is necessary to emphasize early screening and to actively develop targeted drugs for PCOS.


Introduction
Polycystic ovary syndrome (PCOS), characterized by chronic anovulation, hyperandrogenism, and insulin resistance, is one of the most common reproductive endocrinopathies, affecting approximately 8% to 18% of reproductive-aged women [1]. In addition to being an important cause of infertility, it is also a risk factor for many metabolic diseases, such as type 2 diabetes mellitus, cardiovascular continuum, and endometrial carcinoma [2,3]. In 2019, 66 million people worldwide had PCOS [4]. The incidence and prevalence rates of PCOS vary by age and region, but in general they are continuously increasing [5]. By 2020, the economic burden of PCOS was estimated at 8 billion dollars per year [6,7]. However, to date, no drug specific to PCOS has been approved by the US FDA or the European Medicines Agency [8]. The drugs prescribed to PCOS patients, such as metformin, letrozole, clomiphene, and oral contraceptives, are off-label and symptom-oriented [9]. To give governments a foundation for decisions on medical resource allocation and to provide a reference for clinical research, it is necessary to explore the epidemiological characteristics of PCOS comprehensively.
The Global Burden of Disease (GBD) study is a comprehensive research project that aims to quantify the impact of various diseases, injuries, and risk factors on global health [10]. A previous investigation found that age and region are key factors in the epidemiological manifestations of PCOS. In 2019, globally, the 40 to 44 age group had the highest prevalence, the 15 to 19 age group had the highest incidence, and the 25 to 29 age group had the highest number of years lived with disability (YLDs). Age-standardized analysis showed that high-income Asia-Pacific, Australasia, and Western Europe were the regions with the largest disease burden of PCOS [4]. A GBD study of PCOS within Europe found that the prevalence rate in the Czech Republic was 460.6 (per 100,000), while in Sweden it was only 34.1 (per 100,000) [11]. These estimates suggest that new etiological determinants of PCOS might be discovered from population, geographic, and social characteristics. However, current GBD research on PCOS is relatively limited. Only one study based on GBD 2019 comprehensively analyzed the data [4], and it did not process the data in greater depth, for example by calculating the estimated annual percentage change (EAPC) and its correlation with the age-standardized rate (ASR) and the human development index (HDI). To date, no research has modeled and evaluated the future burden of PCOS.
Herein, we retrieved information from the GBD 2019 study and conducted an in-depth analysis to comprehensively display the global, regional, and national burden of PCOS. In addition to analyzing prevalence, incidence, and YLDs, we further evaluated the EAPC and its sources of heterogeneity. The Autoregressive Integrated Moving Average (ARIMA) and Bayesian age-period-cohort (BAPC) models were used to assess the burden of PCOS over the next 20 years.

Data sources
Annual cases and percentages of PCOS from 1990 to 2019, by region and country, were obtained from the Global Health Data Exchange (GHDx) query tool (https://vizhub.healthdata.org/gbd-results) [12]. DisMod-MR 2.1, a Bayesian meta-regression tool, was used to model incidence and prevalence. YLDs were calculated by multiplying the prevalence estimates by the disability weights for mutually exclusive sequelae of the disease [13]. The estimates were reported with 95% uncertainty intervals (UIs). The characteristics of the GBD study versions and the detailed steps for using this database have been described in previous studies [12].
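The YLD calculation described above can be illustrated with a minimal sketch. The function name, the sequela labels, and all numbers below are hypothetical placeholders chosen for illustration; they are not GBD estimates or GBD disability weights.

```python
def years_lived_with_disability(prevalent_cases, disability_weights):
    """Sum prevalent cases times disability weight over mutually exclusive sequelae."""
    return sum(
        cases * disability_weights[sequela]
        for sequela, cases in prevalent_cases.items()
    )


# Hypothetical inputs for illustration only (not GBD estimates).
prevalent_cases = {"sequela_a": 1000, "sequela_b": 500}
disability_weights = {"sequela_a": 0.01, "sequela_b": 0.1}

ylds = years_lived_with_disability(prevalent_cases, disability_weights)
print(ylds)
```

Because the sequelae are mutually exclusive, each prevalent case contributes to exactly one term of the sum.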

Case definitions
PCOS is defined as a common endocrine and metabolic disorder among reproductive-aged women, characterized by the presence of polycystic ovaries, hyperandrogenism, and menstrual dysfunction. The diagnosis of PCOS is often made using the Rotterdam criteria, which require the presence of at least two of the following: oligo-ovulation or anovulation, hyperandrogenism (clinical or biochemical), and polycystic ovaries on ultrasound. The GBD database categorizes PCOS as a non-communicable disease and includes it in the broader category of gynecological diseases. In GBD 2019, the diagnosis of PCOS can be made by any of the following approaches: the Rotterdam criteria, the NIH criteria, and the Androgen Excess and PCOS Society definition [4,14].

Measures of burden
The prevalence and incidence of PCOS from 1990 to 2019 in 204 countries and territories were shown on the world map, and the global distribution of the age-standardized incidence rate (ASIR) and age-standardized prevalence rate (ASPR) in 2019 was plotted. The EAPC is a metric for measuring trends, commonly used to describe the rate of increase or decrease of a variable over a given period. The EAPC was calculated by fitting a regression line to the natural logarithm of the rates [15]. Estimates were classified according to Commonwealth income group, WHO region, and the sociodemographic index (SDI).
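As a minimal sketch of the EAPC calculation, assuming the commonly used form ln(rate) = alpha + beta * year with EAPC = 100 * (exp(beta) - 1), the slope can be obtained by ordinary least squares. The function name and the synthetic series below are illustrative assumptions, not the authors' code or data.

```python
import math


def eapc(years, rates):
    """EAPC in percent from an OLS fit of ln(rate) = alpha + beta * year."""
    n = len(years)
    log_rates = [math.log(r) for r in rates]
    mean_x = sum(years) / n
    mean_y = sum(log_rates) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, log_rates))
    beta = sxy / sxx  # slope of the regression line
    return 100.0 * (math.exp(beta) - 1.0)


# Synthetic series growing by exactly 0.85% per year (illustration only).
years = list(range(1990, 2020))
rates = [50.0 * 1.0085 ** (y - 1990) for y in years]

estimate = eapc(years, rates)
print(round(estimate, 2))
```

A positive EAPC indicates an increasing trend in the rate over the period, and a negative EAPC a decreasing one.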

Cluster analysis and correlation analysis
Hierarchical clustering analysis is a clustering algorithm used to partition a set of data into multiple distinct clusters. Unlike other clustering algorithms, such as K-means, hierarchical clustering does not require the number of clusters to be specified in advance. Instead, it merges similar samples into clusters based on the distance between each pair of samples until all samples are merged. The results were represented using a dendrogram, in which each leaf node represents a sample and each internal node represents the merging of clusters. The EAPCs of incidence and prevalence of the 204 countries and territories were clustered into 4 categories (decrease or tiny increase, minor increase, stable increase, and significant increase).
Correlation analysis is a statistical method used to study the linear relationship between two variables by calculating the correlation coefficient. The correlation coefficient is a standardized measure that describes the strength and direction of the linear relationship between two variables. The cor.test function in R was used to calculate the correlation coefficients between EAPC, ASIR, ASPR, and HDI. The results were illustrated as scatter plots combined with a trend line.
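The merging procedure described above can be sketched in a few lines. This is an illustration under stated assumptions: the text does not specify the distance measure or linkage rule, so Euclidean distance and average linkage are assumed here, and the (incidence EAPC, prevalence EAPC) pairs are hypothetical.

```python
import math


def agglomerative(points, n_clusters):
    """Average-linkage agglomerative clustering.

    Returns a list of clusters, each a list of indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]

    def average_distance(a, b):
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    # Repeatedly merge the two closest clusters until n_clusters remain.
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical (incidence EAPC, prevalence EAPC) pairs for eight countries.
eapc_pairs = [(-0.4, -0.3), (-0.2, -0.1), (0.5, 0.5), (0.6, 0.55),
              (1.4, 1.5), (1.5, 1.45), (3.0, 3.1), (3.2, 3.0)]

groups = agglomerative(eapc_pairs, n_clusters=4)
print(sorted(sorted(g) for g in groups))
```

Stopping the merging at four clusters corresponds to cutting the dendrogram at the height that leaves four branches.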
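For readers who want to see what the reported t values correspond to, the sketch below computes a Pearson correlation coefficient and the associated t statistic with n - 2 degrees of freedom, which is what cor.test reports by default. The function names and the data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: correlation = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical HDI and EAPC values for five countries (illustration only).
hdi = [0.4, 0.5, 0.6, 0.7, 0.8]
eapc_values = [2.0, 1.0, 4.0, 3.0, 5.0]

r = pearson_r(hdi, eapc_values)
t = t_statistic(r, len(hdi))
print(round(r, 3), round(t, 3))
```

The sign of t follows the sign of r, so a negative t indicates a negative linear association.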

ARIMA model and BAPC model for forecasting (2020-2042)
The ARIMA model was applied to predict the prevalence and incidence rates of PCOS. The core of ARIMA is to difference the time series so that a non-stationary series becomes stationary, and then to model the stationary series. The ARIMA model has three main parameters: p, d, and q. Here, p is the order of the autoregressive term, d is the order of differencing, and q is the order of the moving average term. The autocorrelation function (ACF) and partial autocorrelation function (PACF) were used to determine the parameters. The forecast and tseries packages were used for ARIMA model forecasting and visualization.
In addition to the ARIMA model, the BAPC model was applied for epidemiological prediction. This model, based on Bayesian statistical theory, is used to analyze and explain trends in population-level rates with respect to age, period, and birth cohort. Unlike models that consider only age and period, the BAPC model also includes the birth cohort as a factor, which allows the impact of birth cohort to be described. Moreover, the BAPC model can accommodate different prior distributions and hyperparameters, which makes model selection and fitting more flexible. Based on the standard age structure in GBD and the projected population data of the WHO, the prevalence and incidence of PCOS over the next 20 years were predicted with the BAPC model. The BAPC and INLA packages were used for BAPC model forecasting and visualization.
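The role of the differencing order d can be shown with a short sketch. The helper name and the cubic toy series are illustrative assumptions; the series was chosen only because it needs exactly three rounds of differencing to become constant.

```python
def difference(series, d=1):
    """Apply first-order differencing d times to a sequence of numbers."""
    out = list(series)
    for _ in range(d):
        out = [later - earlier for earlier, later in zip(out, out[1:])]
    return out


# Toy series with a cubic trend (illustration only, not PCOS data).
cubic_trend = [t ** 3 for t in range(7)]

print(difference(cubic_trend, d=1))
print(difference(cubic_trend, d=2))
print(difference(cubic_trend, d=3))
```

Each round of differencing shortens the series by one observation, which is one reason a high d leaves less information for estimating p and q.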
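For orientation, an age-period-cohort model of this kind is commonly written as below. This is the general form found in the APC literature, given here as an assumption; the text does not state the exact priors or hyperparameters that were used.

```latex
% y_{ij}: cases in age group i (i = 1, ..., I) and period j (j = 1, ..., J)
% n_{ij}: population at risk; \lambda_{ij}: underlying rate
y_{ij} \sim \mathrm{Poisson}\left(n_{ij}\,\lambda_{ij}\right), \qquad
\log \lambda_{ij} = \mu + \alpha_i + \beta_j + \gamma_k, \qquad
k = M\,(I - i) + j
```

Here \mu is the intercept, \alpha_i, \beta_j, and \gamma_k are the age, period, and cohort effects, and M is the ratio of the age-group width to the period width (for example, M = 5 for 5-year age groups and annual periods). In Bayesian formulations the three sets of effects are typically given random-walk smoothing priors, and projections are obtained by extending the period and cohort effects forward.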

Statistical analysis
Counts and ASRs (per 100,000) of prevalence, incidence, and YLDs, with 95% UIs, were used to assess the burden of PCOS. All statistical analyses and visualizations were conducted in R (version 4.2.2). Detailed information on the packages used in this study is listed in S1 Fig in S1 File. A P-value < 0.05 was considered statistically significant.
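An age-standardized rate is a weighted average of age-specific rates, with weights taken from a standard population. The sketch below shows this direct standardization; the function name, the three age groups, the rates, and the weights are hypothetical and are not the GBD standard population.

```python
def age_standardized_rate(age_specific_rates, standard_weights):
    """Directly standardized rate: weighted mean of age-specific rates."""
    total_weight = sum(standard_weights)
    weighted_sum = sum(
        rate * weight
        for rate, weight in zip(age_specific_rates, standard_weights)
    )
    return weighted_sum / total_weight


# Hypothetical age-specific rates (per 100,000) and standard weights.
rates_per_100k = [400.0, 450.0, 300.0]
weights = [0.3, 0.3, 0.4]

asr = age_standardized_rate(rates_per_100k, weights)
print(round(asr, 1))
```

Using the same weights for every region and year removes the effect of differing age structures, which is why ASRs rather than crude rates are compared across regions.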

Global, regional and national level
From 1990 to 2019, the number of PCOS patients worldwide increased by 32 million. The annual number of incident PCOS cases increased from 1.4 million in 1990 to 2.1 million in 2019. The YLDs increased from 0.3 million to 0.6 million. The EAPC estimates of prevalence, incidence, and YLDs were 0.84, 0.85, and 0.83, respectively (Fig 1).
By WHO region, the Western Pacific Region had the highest incidence in 1990, while the South-East Asia Region had the highest incidence in 2019. Only in the Region of the Americas did the EAPC indicate a decrease in incidence, yet its ASIR was the highest in both 1990 and 2019. The EAPC of incidence was highest in the South-East Asia Region, and the ASIR of the other regions did not decrease. Similarly, the Region of the Americas was the only region in which the EAPC of prevalence decreased, and its ASPR estimates were the highest. The prevalence counts of the Western Pacific Region were the highest in 1990 and 2019. The regional trends of YLDs were consistent with those of the incidence rate: the ASR was highest in the Region of the Americas in 1990 and 2019, while the EAPC was highest in the South-East Asia Region (Fig 1).
By SDI region, the Middle SDI regions had the highest incidence counts in 1990 and 2019, while the High SDI regions had the highest ASIR estimates in both years. Only the High SDI regions had a negative EAPC of incidence. In 1990, the High SDI regions had the highest prevalence counts; in 2019, they ranked second, and the Middle SDI regions had the highest prevalence counts. The EAPC of prevalence was positive in every SDI region and was highest in the Low-middle SDI regions. In 1990, the estimated YLD counts were highest in the High SDI regions, while in 2019 they were highest in the Middle SDI regions. The EAPC estimates of YLDs were all positive and were lowest in the High SDI regions (Fig 1).
By Commonwealth income group, the counts of incidence, prevalence, and YLDs were highest in the Commonwealth Middle Income region in both 1990 and 2019, while the ASR estimates of all three were highest in the Commonwealth High Income region. The EAPC estimates of the three measures were also highest in the Commonwealth Middle Income region (Fig 1, Table 1).
In 2019, among the 204 countries and territories, Albania, Bosnia and Herzegovina, North Macedonia, and Serbia had the lowest ASIR, and Italy, Japan, and New Zealand had the highest ASIR. Similarly, the four countries with the lowest ASIR also had the lowest ASPR. In addition to Italy, Japan, and New Zealand, Australia and Malaysia showed particularly high ASPR. The analysis of changes in prevalence from 1990 to 2019 revealed that prevalence decreased in Italy, Japan, Latvia, Bermuda, Bulgaria, Northern Mariana Islands, and Lithuania, and increased the most in Equatorial Guinea, Qatar, United Arab Emirates, and Maldives. Incidence decreased in 34 countries and territories; the largest decrease was in Japan (-35%), and the largest increase was in Equatorial Guinea (749%). Seven countries or territories had negative EAPC estimates.
Hierarchical clustering of the incidence and prevalence EAPC showed that 17 countries fell into the decrease or tiny increase group, 4 countries into the minor increase group, 117 countries into the stable increase group, and 66 countries into the significant increase group (Fig 3).

Age pattern
In the GBD study, ages are divided into 5-year groups. There are 21 groups from 0 to 100 plus years old. The incidence and prevalence of PCOS were only recorded between 10 and 54 years old, which corresponds to nine groups. Globally, the 10 to 14 group and the 15 to 19 group had the highest incidence counts of PCOS, and the incidence in the later age groups declined precipitously. The prevalence counts peaked in the 20 to 24 group in 1990 and in the 25 to 29 group in 2019. From 1990 to 2019, the largest change in prevalence was in the 30 to 34 group. In the analysis by age and WHO region, only in the European Region and the Eastern Mediterranean Region was the incidence of the 10 to 14 group larger than that of the 15 to 19 group. In 2019, the prevalence in most regions peaked in the 20 to 29 group, while the Western Pacific Region peaked in the 30 to 34 group and the European Region peaked in the 40 to 44 group. In the analysis by age and SDI region, the incidence in the High-middle SDI region and the Middle SDI region was higher in the 10 to 14 group than in the 15 to 19 group. The peak age group of the Low-middle SDI region and the Low SDI region was the 20 to 24 group, while the peak age groups of the High SDI region, the High-middle SDI region and the

Annual pattern
Globally, from 1990 to 2019, the ASPR and ASIR time series of PCOS increased year by year, and the trends of ASPR and ASIR within each region were consistent with each other. By WHO region, the incidence of the African Region decreased only during 2007 to 2010, and the prevalence and incidence of the Region of the Americas decreased during 1999 to 2010. By SDI region, only the High SDI region did not increase every year: both its prevalence and its incidence failed to increase from 2001 to 2010. The regions classified by Commonwealth income maintained a similar increasing trend (Fig 5, S2 Table in S1 File).

EAPC influential factors
As shown in Fig 6, the ASR in 1990 was taken as the baseline disease level of PCOS. Both baseline ASIR and baseline ASPR were significantly negatively correlated with EAPC (t = -5.3, p < 0.01 for ASIR; t = -5.4, p < 0.01 for ASPR). The HDI of each country was taken as an indicator of health care level. Across the full distribution of HDI, there was no significant correlation between HDI and EAPC (t = -1.7, p = 0.09 for ASIR; t = -2.0, p = 0.05 for ASPR). However, when HDI < 0.7, the EAPC of incidence and prevalence was positively correlated with HDI, and when HDI > 0.7, it was negatively correlated with HDI. Countries with a middle-level HDI had the strongest increasing trend in ASIR and ASPR (S3 and S4 Tables in S1 File).

Prediction of the global PCOS burden
The ARIMA models of ASPR and ASIR showed that the data became stationary only after three differencing operations (p = 0.01) (S3 Fig in S1 File). The ACF plot of ASPR indicated that the autocorrelation values exceeded the boundary at the third lag, and the PACF plot showed that the partial autocorrelation values became stationary after the third lag. Therefore, an ARMA (3,3) model was selected. The ACF plot of ASIR showed that the autocorrelation values exceeded the boundary at the third lag, and the PACF plot showed that the partial autocorrelation values became stationary after the second lag. Therefore, an ARMA (3,2) model was selected. The predictions of the ARIMA model (S5 Table in S1 File) showed that by 2042, the ASIR and ASPR of PCOS will reach 112 (per 100,000) and 3250 (per 100,000), respectively. It should be noted that a series volatile enough to require three differencing operations calls for careful evaluation when the predicted results are interpreted.
The prediction of the BAPC model (Fig 7) indicated that the prevalence and incidence of PCOS would consistently increase over the next 20 years. In 2042, the predicted ASPR was 3806 (per 100,000). By age group, the 35 to 39 group showed the highest prevalence, 5337 (per 100,000). The predicted ASIR in 2042 was 117 (per 100,000). By age group, the incidence rate was 433 (per 100,000) in the 10 to 14 group and 469 (per 100,000) in the 15 to 19 group (S6 and S7 Tables in S1 File).
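To make the phrase "exceed the boundary" concrete, the sketch below computes a sample autocorrelation function and the approximate 95% bound of 1.96 divided by the square root of n that is conventionally drawn on ACF plots. The function name and the alternating toy series are illustrative assumptions, not the study data.

```python
import math


def sample_acf(series, max_lag):
    """Sample autocorrelations for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    denominator = sum((value - mean) ** 2 for value in series)
    return [
        sum((series[t] - mean) * (series[t + lag] - mean)
            for t in range(n - lag)) / denominator
        for lag in range(max_lag + 1)
    ]


# Toy alternating series (illustration only).
toy_series = [1.0, -1.0] * 10

acf_values = sample_acf(toy_series, max_lag=3)
bound = 1.96 / math.sqrt(len(toy_series))  # approximate 95% bound

# Lags (excluding lag 0) whose autocorrelation lies outside the bound.
significant_lags = [
    lag for lag, value in enumerate(acf_values) if lag > 0 and abs(value) > bound
]
print(acf_values, bound, significant_lags)
```

Lags outside the bound are the ones usually taken as evidence of remaining autocorrelation when candidate orders are chosen.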

Discussion
Compared with other publications summarizing the global burden of PCOS based on the GBD study 2019, this study added analyses of prevalence and incidence by multiple regions and countries, age groups, and time series to explore the sources of heterogeneity in the estimates. The Region of the Americas showed the highest ASIR and ASPR in 1990 and 2019, but the South-East Asia Region and the Western Pacific Region had the highest counts of incidence and prevalence, which reflects the total population of those regions. The lowest ASIR and ASPR were seen in the African Region. Another article, which used a region classification different from that of this study, reported that high-income Asia-Pacific, Australasia, and Western Europe were the regions with the heaviest disease burden [4]. Racial and ethnic differences were the first sources of heterogeneity to be considered. Analyses of genomic databases revealed that the ethnic variability of PCOS is indeed determined by human genetic background, and ethnic variations in PCOS phenotypic expression occur in all regions [16,17]. For instance, the body mass index (BMI) and homeostasis model assessment-insulin resistance (HOMA-IR) of Mexican-American women are higher than those of Caucasian women, and hairiness in Asian women is significantly lower than in Caucasian women [18]. These features are part of the diagnostic criteria for PCOS, so racial phenotypic variation largely determines the different characteristics of the disease.
Different diagnostic criteria may lead to changes in epidemiological data. In the time series analysis, the ASIR of PCOS in the Region of the Americas decreased from 1999 to 2010, which is inconsistent with what would be expected after the introduction of the Rotterdam diagnostic criteria. The Rotterdam criteria, with two new phenotypes, should have broadened the definition of PCOS, so prevalence would be expected to increase rather than decrease. In addition to the international diagnostic criteria, some countries and territories have developed their own PCOS criteria based on the characteristics of their populations. For example, PCOS diagnostic criteria were formulated by the Japanese Society of Obstetrics and Gynecology (JSOG) in 1993 and updated in 2007 [19,20]. The JSOG criteria emphasize the importance of LH and FSH in diagnosing PCOS, which is better suited to the characteristics of people in eastern Asia. This may partly explain why Japan showed the highest ASIR and ASPR among the 204 countries and territories.
Previous studies have shown a positive correlation between SDI and the burden of PCOS, possibly because the westernized diet is more popular in developed countries and is closely related to risk factors for PCOS such as obesity and insulin resistance [4]. In this study, SDI was divided into five grades. Although the ASPR of the High SDI region was the highest in 1990, the ASIR and ASPR of the Middle SDI region were both the highest in 2019. When the regions were classified into three categories according to Commonwealth income, the ASIR and ASPR of the middle-level regions were the highest. Therefore, it is inaccurate to simply conclude that high SDI represents a high PCOS burden. To further explore the relationship between development and burden, EAPC and HDI were included in the correlation analysis. HDI is composed of three indicators: life expectancy, adult literacy rate, and the logarithm of GDP per capita [21,22]. Our results demonstrated that when HDI was < 0.7, EAPC and HDI were positively related, while when HDI was > 0.7, the relationship was reversed. First, compared with the low HDI regions, the average life expectancy of residents in the middle HDI regions has increased more [23]. Because PCOS is a chronic disease, patients live with it for a longer period, which may lead to an increase in prevalence. In the process of population aging, industrialization, and urbanization, inevitably deteriorating social and environmental factors also have a significant impact on the incidence of PCOS. Individuals in the middle HDI regions may have higher behavioral risk factors, such as unsuitable eating habits in which nutrient, carbohydrate, and fat intake exceeds consumption for a long time [24,25]. Poor living habits and lack of exercise are also risk factors for PCOS and metabolic diseases [26].
In terms of age pattern, the 15 to 19 group had the highest incidence of PCOS. Only in the European Region and the Eastern Mediterranean Region was this pattern reversed, with the 10 to 14 group having the higher incidence. Because PCOS is a hormone-related disease that mainly occurs in women of childbearing age, this may be explained by differences in the age of sexual maturity between regions [27]. In the prevalence analysis, the peak age group of PCOS globally shifted from the 20 to 24 group in 1990 to the 25 to 29 group in 2019, which can be explained by the aforementioned increase in life expectancy and population aging. The prevalence of PCOS in most regions peaked in the 20 to 29 group and then declined slowly, which is related to active treatment and the age-specific incidence of PCOS. The European Region showed a different age-prevalence pattern in 2019: the counts of PCOS patients increased steadily from the 10 to 14 group to the 40 to 44 group. This was closely related to the wave of European immigration and the formation of an aging society. According to Eurostat data for 2019, the population aged 65 and over in the 27 EU countries reached 90.5 million, accounting for 20.3% of the total population [28].
This study has the following limitations. Firstly, the DisMod-MR model of GBD only considers the NIH criteria as the reference standard for diagnosing PCOS, ignoring some local definitions, which may lead to heterogeneity in epidemiological data. In particular, the NIH criteria rely heavily on the presence of oligo-ovulation and hyperandrogenism, which may not be suitable for some populations. As a result, the prevalence of PCOS may be underestimated or overestimated, which affects the accuracy of burden estimates. Secondly, when GBD collected and collated regional data, some regional data were inevitably incomplete, and it was difficult to carry out regional analysis within a country. For example, some studies may not provide regional data, while others may provide data that are not consistent with the GBD regional classification. Therefore, the regional burden of PCOS may not be accurately estimated, which affects the effectiveness of targeted intervention and resource allocation. Finally, risk factors for PCOS are not available in the GBD study database. PCOS is a multifactorial disorder, and many factors have been implicated in its pathogenesis, including genetic, environmental, and lifestyle factors. This study therefore cannot provide a comprehensive understanding of the risk factors for PCOS, which hinders the development of effective prevention and control strategies.

Conclusion
This study revealed that the incidence, prevalence, and YLDs of PCOS have been increasing and that this trend is projected to continue over the next 20 years. Social and economic development is not uniformly positively related to the PCOS burden, and the burden is highest in middle-level regions. Women's health infrastructure should be strengthened to care for potential PCOS patients in the future. The incidence of PCOS is highest from 10 to 19 years old, which indicates that governments should pay attention to early screening in adolescents.

Fig 1. Global and regional burden of PCOS from 1990 to 2019. (A) Global and regional estimates of ASIR from 1990 to 2019. (B) Global and regional estimates of ASPR from 1990 to 2019. (C) Global and regional estimates of YLDs ASR from 1990 to 2019. (D) Global and regional EAPC of incidence, prevalence and YLDs. https://doi.org/10.1371/journal.pone.0306991.g001

Among the countries and territories with negative EAPC estimates, the United States of America had the lowest. Zimbabwe and the United States of America were the only two countries with increased prevalence and negative EAPC estimates (Fig 2, S2 Fig in S1 File).

Fig 7. BAPC prediction of PCOS burden in the next 20 years. (A) ASIR and incidence rate of age groups from 1990 to 2042. (B) ASPR and prevalence rate of age groups from 1990 to 2042. (C) Estimates of ASIR and ASPR in 2042. https://doi.org/10.1371/journal.pone.0306991.g007
